Reinforcement learning is used to learn complex time- and situation-dependent behaviours when there is either no training data or too little of it. Often the points at which rewards occur are well after the action that caused them and may be the result of several past actions, leading to credit assignment problems, which make it hard to know which actions to reinforce positively or negatively. Learning takes place through interactions with a (real or simulated) world and therefore has a cost, both directly due to the action being performed (energy expenditure for a robot, network costs for a web agent) and indirectly due to the positive or negative effects of the action. However, without taking actions there is no potential for learning; this leads to an exploration-exploitation trade-off.
Used in Chap. 6: page 92; Chap. 16: pages 255, 260, 261, 267, 268; Chap. 22: page 374
Also known as reinforcement function, reinforcement learner
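
The exploration-exploitation trade-off described in this entry can be illustrated with a small sketch. It is not a method taken from the chapters listed above; it uses epsilon-greedy action selection on a simulated multi-armed bandit, chosen here only because it is one of the simplest settings in which the trade-off appears. The function name run_bandit, its parameters and the Gaussian reward model are all assumptions made for illustration.

```python
import random


def run_bandit(true_means, epsilon, steps, seed=0):
    """Epsilon-greedy agent on a simulated bandit (illustrative sketch only).

    true_means: hypothetical mean reward of each action, hidden from the agent.
    epsilon:    probability of exploring (choosing a random action) at each step.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    estimates = [0.0] * n_arms  # the agent's running estimate of each action's value
    counts = [0] * n_arms       # how many times each action has been taken
    total_reward = 0.0

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random action, possibly giving up reward now.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: take the action that currently looks best.
            arm = max(range(n_arms), key=lambda a: estimates[a])

        # The simulated world returns a noisy reward (assumed Gaussian, sd 1).
        reward = rng.gauss(true_means[arm], 1.0)

        # Incremental update of the sample mean for the chosen action.
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward

    return estimates, counts, total_reward


if __name__ == "__main__":
    estimates, counts, total = run_bandit([0.0, 1.0, 3.0], epsilon=0.1, steps=2000)
    print("times each action was taken:", counts)
    print("estimated action values:", [round(e, 2) for e in estimates])
```

With epsilon set to 0 the agent never deliberately explores and may settle on a poor action; with epsilon set to 1 it explores constantly and never uses what it has learned. Values in between trade some immediate reward for better knowledge of the world.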